Self-managed collections: Off-heap memory management for scalable query-dominated collections
Authors
Abstract
Explosive growth in DRAM capacities and the emergence of language-integrated query enable a new class of managed applications that perform complex query processing on huge volumes of data stored as collections of objects in the memory space of the application. While more flexible in terms of schema design and application development, this approach typically experiences sub-par query execution performance when compared to specialized systems such as database management systems (DBMSs). To address this issue, we propose self-managed collections, which utilize off-heap memory management and dynamic query compilation to improve the performance of querying managed data through language-integrated query. We evaluate self-managed collections using both microbenchmarks and enumeration-heavy queries from the TPC-H business intelligence benchmark. Our results show that self-managed collections outperform ordinary managed collections in both query processing and memory management by up to an order of magnitude and even outperform an optimized in-memory columnar database system for the vast majority of queries.
Similar resources
Efficient query processing in managed runtimes
This thesis presents strategies to improve the query evaluation performance over huge volumes of relational-like data that is stored in the memory space of managed applications. Storing and processing application data in the memory space of managed applications is motivated by the convergence of two recent trends in data management. First, dropping DRAM prices have led to memory capacities that...
The SCAPE Planning and Watch suite: Supporting the preservation lifecycle in repositories
Increasingly, content owners are operating repositories with large, heterogeneous collections. The responsibility to provide access to these collections on the long term requires preservation processes such as planning, monitoring, and actual preservation operations such as migration and quality assurance, which have to be managed and integrated with the repositories. This article presents a su...
Code Generation for Efficient Query Processing in Managed Runtimes
In this paper we examine opportunities arising from the convergence of two trends in data management: in-memory database systems (IMDBs), which have received renewed attention following the availability of affordable, very large main memory systems; and language-integrated query, which transparently integrates database queries with programming languages (thus addressing the famous ‘impedance mi...
Detecting memory leaks in managed languages with Cork
A memory leak in a managed program occurs when the program inadvertently maintains references to objects that it no longer needs. Memory leaks cause systematic heap growth which degrades performance and results in program crashes after perhaps days or weeks of execution. Prior approaches for detecting memory leaks rely on heap differencing or detailed object statistics which store state proport...
Thesis Proposal: Regional Garbage Collection
The ongoing shift from 32-bit to 64-bit processor environments forces garbage collectors to cope with the larger heaps made possible by the increased address space. On 32-bit machines, generational collectors that occasionally pause to collect the entire heap work well enough for many applications, but that paradigm does not scale up because collection pauses that take time proportional to the ...